[SPARK-21409][SS] Expose state store memory usage in SQL metrics and progress updates#18629
[SPARK-21409][SS] Expose state store memory usage in SQL metrics and progress updates#18629tdas wants to merge 2 commits intoapache:masterfrom
Conversation
| import org.apache.spark.sql.streaming.StreamingQueryStatusAndProgressSuite._ | ||
|
|
||
| class StreamingQueryStatusAndProgressSuite extends StreamTest with Eventually { | ||
| implicit class EqualsIgnoreCRLF(source: String) { |
There was a problem hiding this comment.
Replaces this with assertJson (see below) because this code made it harded to debug the differences between json strings. With assertJson, scala test would show nice diffs.
[info] - StreamingQueryProgress - prettyJson *** FAILED *** (137 milliseconds)
[info] "..."numRowsUpdated" : 1[,]
[info] "memoryUsedByte..." did not equal "..."numRowsUpdated" : 1[]
[info] "memoryUsedByte..." (StreamingQueryStatusAndProgressSuite.scala:213)
[info] org.scalatest.exceptions.TestFailedException:
|
Test build #79595 has finished for PR 18629 at commit
|
|
Test build #79610 has finished for PR 18629 at commit
|
|
|
||
| override def numKeys(): Long = mapToUpdate.size() | ||
| override def metrics: StateStoreMetrics = { | ||
| StateStoreMetrics(mapToUpdate.size(), SizeEstimator.estimate(mapToUpdate), Map.empty) |
There was a problem hiding this comment.
Should we add a flag for this? SizeEstimator.estimate will be very slow when there are a lot of states, because it scans all objects using reflection.
There was a problem hiding this comment.
Let me do some tests to understand how long it will take. For arrays it will just sample, so it should not take that long.
There was a problem hiding this comment.
Takes < 0.5 ms with a state store with 5 million elements
|
LGTM. Merging to master. |
What changes were proposed in this pull request?
Currently, there is no tracking of memory usage of state stores. This JIRA is to expose that through SQL metrics and StreamingQueryProgress.
Additionally, added the ability to expose implementation-specific metrics through the StateStore APIs to the SQLMetrics.
How was this patch tested?
Added unit tests.